-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
use ccalendar instead of np_datetime #21549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
what does the generated code look like? (for that function) |
@@ -437,12 +437,6 @@ class BaseOffset(_BaseOffset): | |||
# ---------------------------------------------------------------------- | |||
# RelativeDelta Arithmetic | |||
|
|||
@cython.wraparound(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use days_per_month_array
in ccalendar.pyx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use
days_per_month_array
in ccalendar.pyx?
We could import it from the C file, but we can't define it directly in cython. Since part of the point of ccalendar was to move to pure-cython, I'm not sure we want to re-introduce the C dependency. Or if we do, we might as well keep the entire implementation in the C file.
Cython 0.28 supposedly allows for "verbatim C" in the form:
cdef extern from *:
"""
foo;
bar;
"""
which could be useful for small snipped. But when I've tried this I get compile-time errors.
After stripping out a bunch of comments:
|
I think this will all get inlined so performance shouldn't differ anyways, but FYI you can equivalently define/import # ccalender.pyx
cdef int* days_per_month_array
#tslib.pyx
from ccalender cimport days_per_month_array |
@chris-b1 I'm not sure we're on the same page as to the goal/problem.
|
Thanks for the color, I was attempting to say that I think this part is false, or at least seemed to work with what I showed above
Interesting (surprising) that performance is different, I will look closer |
At least with # master
big_ts = pd.date_range('1999-01-01', periods=1_000_000, freq='H')
%timeit big_ts.daysinmonth
91.6 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# patch
big_ts = pd.date_range('1999-01-01', periods=1_000_000, freq='H')
%timeit big_ts.daysinmonth
94.7 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) patch diff --git a/pandas/_libs/tslibs/fields.pyx b/pandas/_libs/tslibs/fields.pyx
index ccf67e765..be53dfe8b 100644
--- a/pandas/_libs/tslibs/fields.pyx
+++ b/pandas/_libs/tslibs/fields.pyx
@@ -14,13 +14,19 @@ from numpy cimport ndarray, int64_t, int32_t, int8_t
cnp.import_array()
from ccalendar import get_locale_names, MONTHS_FULL, DAYS_FULL
-from ccalendar cimport (get_days_in_month, is_leapyear, dayofweek,
+from ccalendar cimport (is_leapyear, dayofweek,
get_week_of_year, get_day_of_year)
from np_datetime cimport (pandas_datetimestruct, pandas_timedeltastruct,
+ days_per_month_table,
dt64_to_dtstruct, td64_to_tdstruct)
from nattype cimport NPY_NAT
+@cython.wraparound(False)
+@cython.boundscheck(False)
+cdef inline int get_days_in_month(int year, int month) nogil:
+ return days_per_month_table[is_leapyear(year)][month - 1]
+
def get_time_micros(ndarray[int64_t] dtindex):
"""
Return the number of microseconds in the time component of a |
Codecov Report
@@ Coverage Diff @@
## master #21549 +/- ##
=======================================
Coverage 91.9% 91.9%
=======================================
Files 153 153
Lines 49547 49547
=======================================
Hits 45537 45537
Misses 4010 4010
Continue to review full report at Codecov.
|
@jbrockmendel status of this? (as the DO NOT MERGE at the top is clear!) |
If I understand @chris-b1 correctly then the perf discrepancy may not be an issue, in which case this is a clear (albeit small) improvement, so merge is OK. |
thanks @jbrockmendel |
Do Not Merge
AFAICT the np_datetime.c versions of these functions are noticeably more performant than the cython versions. I could use help verifying this observation since asv doesn't work well for me.
Best guess is that lookups in np_datetime.c's
days_per_month_table[2][12]
are more efficient than lookups inccalendar.days_per_month_array
, as there does not appear to be a way to instantiate the C data structure in cython (at least not as a module-global).